Our project goal is to examine and understand the impact of COVID-19 on integral aspects of the U.S. economy such as GDP and Unemployment Rates; we chose these parameters as they are often used by economists to measure the current state of an economy.
We have first gathered COVID-19 health data recorded daily by the U.S. Center for Disease Control to understand which months of the pandemic faced the highest number of new cases and deaths. We utilized R to wrangle this data into monthly, national data and created bar plots to help us visualize the peaks and troughs in the pandemic timeline beginning with Jan ‘20 through Mar ‘21.
To study unemployment rates, we gathered Labor Force Statistics from the Current Population Survey Dataset from the U.S. Bureau of Labor Statistics. This dataset helped us understand the shifts in unemployment rates and the 1-monthly change in rates from Jan ‘20 to Mar ‘21. We mapped a timeline of the increase/decrease in cases and deaths along with the change in unemployment rates in order to infer a correlation. In creating these line charts, we calculated new cases and deaths as a proportional of the total (from Jan ‘20 to Mar ‘21) in order to place all three variables (unemployment rates, new cases, and deaths) on one axis.
In order to further understand economic impacts in specific regions of the U.S., we have focused our GDP analysis on four states with contrasting population sizes and responses to the pandemic. We specifically examined California (strong response to the pandemic, large population size), Texas (relaxed response to the pandemic, large population size), Rhode Island (strong response to the pandemic, small population size), and Utah (relaxed response to the pandemic, small population size). We also explored the GDP changes between various industries such as art, health, technical professions, real estate, and food to better understand which contributors to state GDP were impacted the most. In our analysis of GDP, we created an interactive map with data from 2019 and 2020 in order to compare pre and post pandemic GDP. We additionally created barplots that displayed the change in GDP from 2017 to 2020 for our four chosen states.
It has been saddening to study this data, quantifying the rise in cases and deaths and amount of individuals who lost their employment or seeing the downward slope of GDP, however, we hope our findings will help inform how government responses to the pandemic impacted local and state economies and the health of residents in their respective economy.
#Economy VisualizationThe global pandemic this past year has impacted many aspects of our normalcy and daily lives. We utilized the Twitter Developer API to better understand how people across the world were thinking about the current state of our economy, by gathering the 600 most recent tweets published with #economy and creating a word cloud to visualize our findings. The top words in the word cloud are “covid”, “business”, followed by “jobs”, “government”, and “globalhealth” to name a few.
We predict a linear correlation between unemployment rates and number of cases. As the number of COVID-19 cases increased, local and state governments entered lockdowns and further quarantine restrictions which impacted business and services, leading to higher levels of unemployment. In contrast, we hypothesize that an increasing number of deaths may not have had an equivalently strong correlation with unemployment rates. There are a few reasons for this: (1) cases spread and are recorded more quickly, (2) the number of cases is drastically higher than number of deaths, (3) there seemed to be a long-term misconception that only the elderly and immunocompromised (who might represent a smaller subset of the general population) were at high risk for severe loss due to the virus.
In terms of GDP per State, we have chosen California, Texas, Rhode Island, and Utah to examine in detail. We predict that California might have experienced an increase in GDP, due to the number of technology companies residing in the State of California who reported record amounts of sales during the shift to a remote, virtual world. We briefly researched this topic with the California State Assembly. California and Texas each have large populations and were at the top of the charts at distinct points of time for cases/deaths, but followed drastically contrasting responses to the pandemic in terms of lockdowns, mask mandates, etc. We hypothesize that California would have seen an increase or at least continuation of GDP from previous years whereas Texas might have experienced a decline. Rhode Island and Utah are closely related in population amounts, however, Utah was the one of the only states to not issue a lockdown order. Specific counties in Utah with larger populations or tourist sites did issue lockdown orders, whereas Rhode Island’s governor issued a stay at home order in late March itself. We briefly researched stay at home orders through the New York Times. We predict that Rhode Island will experience a steady GDP in 2020 whereas Utah might experience a decline due to tourist locations and highly populated areas (with the highest job density) were the only ones experiencing lockdowns.
| variable | description |
|---|---|
U.S. COVID-19 Cases |
Number of COVID-19 positive cases recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. COVID-19 Deaths |
Number of deaths from COVID-19 recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. Unemployment Rate per Month |
National recorded unemployment rates on a monthly basis from January 2020 to March 2021. |
U.S. GDP Percent Change per Year |
National monthly change in unemployment rates on a monthly basis from January 2020 to March 2021. |
Industry |
Arts, Health, Food, Real Estate, and Technical. |
States |
California, Texas, Rhode Island, and Utah. |
# load twitter library, and all other libraries
library(rtweet)
library(ggplot2)
library(dplyr)
library(tidytext)
# name of created app
appname <- "x"
## api key
key <- "x"
## api secret
secret <- "x"
# create token named "twitter_token"
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret,
access_token = 'x',
access_secret = 'x')
data <- search_tweets("#economy", n = 600, include_rts = FALSE)
head(data$text)
## [1] "Makinde Flags Off UN #Food Systems Summit https://t.co/qVc1p3pS9Y #blockchain #boove #business #culture #drones #economy #education #feedly #globalhealth #government #influencer #jobs #nature #news #niger #nigeria #oyo #politics #power #security #space"
## [2] "Ganduje Extends Teachers, Lecturers Retirement Age To 65 https://t.co/afcJK3mXIZ #blockchain #boove #business #covid19 #culture #drones #economy #education #featured #feedly #globalhealth #government #influencer #investment #jobs #kano #nature #news #niger #nigeria #power #space"
## [3] "Defence Secretary Ben Wallace’s Tribute To HRH The Duke Of Edinburgh https://t.co/bAi1EQAOGB #blockchain #boove #covid19 #culture #delta #drones #economy #education #feedly #globalhealth #government #influencer #jobs #nature #news #niger #power #space #ukgovernment"
## [4] "#Lagos Restates Commitment To Functional, Quality #Education https://t.co/1qLsxCkiPm #blockchain #boove #business #culture #drones #economy #feedly #globalhealth #government #influencer #jobs #nature #news #niger #power #space"
## [5] "#Lagos Pension Verification Exercise Now Online https://t.co/C43ttpsLIc #blockchain #business #covid19 #culture #drones #economy #education #feedly #globalhealth #government #health #influencer #jobs #nature #news #niger #nigeria #politics #power #security #space"
## [6] "Boko Haram Informant Arrested By Nigerian Army In Yobe https://t.co/Gxw48HWMBa #blockchain #boove #business #covid19 #culture #drones #economy #education #feedly #globalhealth #influencer #investment #jobs #nature #news #niger #power #space #tourism #travel #vintage"
# Load libraries
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)
# Create a vector containing only the text
text <- data$text
# Create a corpus
docs <- Corpus(VectorSource(text))
# Clean data
docs <- docs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
# Generate the word cloud
set.seed(1234)
wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
Let’s take a look at the total counts of new COVID-19 cases and deaths recorded each month in the U.S. This data has been reported by the U.S. Center for Disease Control in their data set: “United States COVID-19 Cases and Deaths by State over TimeCase Surveillance”. We have cleaned and transformed daily COVID-19 reported data by state into national data organized by month.
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
## 7 JUL20 2020-07-01 1924412 28249
## 8 AUG20 2020-08-01 1470219 29184
## 9 SEPT20 2020-09-01 1217378 22327
## 10 OCT20 2020-10-01 1928056 24012
## 11 NOV20 2020-11-01 4400344 39397
## 12 DEC20 2020-12-01 6395806 79189
## 13 JAN21 2020-01-01 6098794 97095
## 14 FEB21 2020-02-01 2354530 63431
## 15 MAR21 2020-03-01 1773722 33041
Now, let’s look at how counts of new cases per month arose and fell from Jan ’20 to Mar ’21:
This bar plot helps us visualize a timeline for the spread of COVID-19. The red bars denote the top six months which experienced the highest number of recorded cases, often referred to as peaks or “second, third” waves. We can infer that July ’20, Oct ’20, Nov ’20, Dec ’20, Jan ’21, and Feb ’21 had the highest number of recorded cases; from this data, we might predict that unemployment rates would be higher during these months and GDP would be lower.
Now, let’s look at how counts of new deaths due to COVID-19 per month arose and fell from Jan ’20 to Mar ’21:
This bar plot reports the months which recorded the highest number of deaths due to COVID-19; the bars in red signify the top five months with the highest amount of deaths. These months include: Apr ’20, May ’20, Dec ’20, Jan ’21, and Feb ’21. In correlation with the highest number of cases recorded, the months of Dec ’20 through Feb ’21 had both the highest number of cases and deaths. It is surprising that April and May of 2020 had two of the highest number of deaths although they didn’t have a peak in cases. However, Apr/May were early months of the pandemic where we had little knowledge about the virus, and their average case count is around 800K which might have propelled the peak in July of 1.9 million cases.
library(magrittr)
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(grid)
library(gridExtra)
library("tidyverse")
ur.df <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/Unemployment-Monthly.csv")
covid19.health <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv")
# Dropping unused columns
ur.cleaned.df <- subset(ur.df, select = -c(Series.ID, Year, Period))
# Renaming columns
colnames(ur.cleaned.df) <- c("Time_Period", "Unemployment_Rate", "Monthly_Change")
#Result
ur.cleaned.df
## Time_Period Unemployment_Rate Monthly_Change
## 1 2020 Jan 3.5 -2.8
## 2 2020 Feb 3.5 0.0
## 3 2020 Mar 4.4 25.7
## 4 2020 Apr 14.8 236.4
## 5 2020 May 13.3 -10.1
## 6 2020 Jun 11.1 -16.5
## 7 2020 Jul 10.2 -8.1
## 8 2020 Aug 8.4 -17.6
## 9 2020 Sep 7.8 -7.1
## 10 2020 Oct 6.9 -11.5
## 11 2020 Nov 6.7 -2.9
## 12 2020 Dec 6.7 0.0
## 13 2021 Jan 6.3 -6.0
## 14 2021 Feb 6.2 -1.6
## 15 2021 Mar 6.0 -3.2
# Dropping unused columns
covh.cleaned <- subset(covid19.health, select = c(submission_date, state, tot_cases, new_case, tot_death, new_death))
# Renaming columns
colnames(covh.cleaned) <- c("Date_Reported", "State", "Total_Cases", "New_Cases", "Total_Death", "New_Death")
# https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
covh.cleaned$Date_Reported <- mdy(covh.cleaned$Date_Reported) # returns year/month/day
# Result
head(covh.cleaned$Date_Reported)
## [1] "2021-04-01" "2020-10-15" "2021-03-16" "2021-04-16" "2020-02-14"
## [6] "2020-08-08"
# https://blog.exploratory.io/filter-with-date-function-ce8e84be680
# JANUARY 2020
jan20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported <= as.Date("2020-01-31")))
jan20.newcases <- sum(jan20.covid.df$New_Cases)
jan20.deaths <- sum(jan20.covid.df$New_Death)
# FEBRUARY 2020
feb20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-02-01") & Date_Reported <= as.Date("2020-02-28")))
feb20.newcases <- sum(feb20.covid.df$New_Cases)
feb20.deaths <- sum(feb20.covid.df$New_Death)
# MARCH 2020
mar20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-03-01") & Date_Reported <= as.Date("2020-03-31")))
mar20.newcases <- sum(mar20.covid.df$New_Cases)
mar20.deaths <- sum(mar20.covid.df$New_Death)
# APRIL 2020
apr20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-04-01") & Date_Reported <= as.Date("2020-04-30")))
apr20.newcases <- sum(apr20.covid.df$New_Cases)
apr20.deaths <- sum(apr20.covid.df$New_Death)
# MAY 2020
may20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-05-01") & Date_Reported <= as.Date("2020-05-31")))
may20.newcases <- sum(may20.covid.df$New_Cases)
may20.deaths <- sum(may20.covid.df$New_Death)
# JUNE 2020
jun20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-06-01") & Date_Reported <= as.Date("2020-06-30")))
jun20.newcases <- sum(jun20.covid.df$New_Cases)
jun20.deaths <- sum(jun20.covid.df$New_Death)
# JULY 2020
jul20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-07-01") & Date_Reported <= as.Date("2020-07-31")))
jul20.newcases <- sum(jul20.covid.df$New_Cases)
jul20.deaths <- sum(jul20.covid.df$New_Death)
# AUGUST 2020
aug20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-08-01") & Date_Reported <= as.Date("2020-08-31")))
aug20.newcases <- sum(aug20.covid.df$New_Cases)
aug20.deaths <- sum(aug20.covid.df$New_Death)
# SEPTEMBER 2020
sep20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-09-01") & Date_Reported <= as.Date("2020-09-30")))
sep20.newcases <- sum(sep20.covid.df$New_Cases)
sep20.deaths <- sum(sep20.covid.df$New_Death)
# OCTOBER 2020
oct20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-10-01") & Date_Reported <= as.Date("2020-10-31")))
oct20.newcases <- sum(oct20.covid.df$New_Cases)
oct20.deaths <- sum(oct20.covid.df$New_Death)
# NOVEMBER 2020
nov20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-11-01") & Date_Reported <= as.Date("2020-11-30")))
nov20.newcases <- sum(nov20.covid.df$New_Cases)
nov20.deaths <- sum(nov20.covid.df$New_Death)
# DECEMBER 2020
dec20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-12-01") & Date_Reported <= as.Date("2020-12-31")))
dec20.newcases <- sum(dec20.covid.df$New_Cases)
dec20.deaths <- sum(dec20.covid.df$New_Death)
# JANUARY 2021
jan21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-01-01") & Date_Reported <= as.Date("2021-01-31")))
jan21.newcases <- sum(jan21.covid.df$New_Cases)
jan21.deaths <- sum(jan21.covid.df$New_Death)
# FEBRUARY 2021
feb21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-02-01") & Date_Reported <= as.Date("2021-02-28")))
feb21.newcases <- sum(feb21.covid.df$New_Cases)
feb21.deaths <- sum(feb21.covid.df$New_Death)
# MARCH 2021
mar21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-03-01") & Date_Reported <= as.Date("2021-03-31")))
mar21.newcases <- sum(mar21.covid.df$New_Cases)
mar21.deaths <- sum(mar21.covid.df$New_Death)
# Creating a variable for months
months <- c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21")
# Creating a variable for new cases
newcases <- c(jan20.newcases, feb20.newcases, mar20.newcases, apr20.newcases, may20.newcases, jun20.newcases, jul20.newcases, aug20.newcases, sep20.newcases, oct20.newcases, nov20.newcases, dec20.newcases, jan21.newcases, feb21.newcases, mar21.newcases)
# Creating a variable for deaths
deaths <- c(jan20.deaths, feb20.deaths, mar20.deaths, apr20.deaths, may20.deaths, jun20.deaths, jul20.deaths, aug20.deaths, sep20.deaths, oct20.deaths, nov20.deaths, dec20.deaths, jan21.deaths, feb21.deaths, mar21.deaths)
# Time series
dates <- c("01/01/2020", "02/01/2020", "03/01/2020","04/01/2020", "05/01/2020","06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", "10/01/2020","11/01/2020", "12/01/2020", "01/01/2021", "02/01/2021", "03/01/2021")
# COVID-19 health per month dataframe
monthly.covid.df <- data.frame(months, dates, newcases, deaths)
monthly.covid.df <- monthly.covid.df %>%
mutate(dates = as.Date(dates, format = "%m/%d/%y"))
# Result
# grid.draw(tableGrob(monthly.covid.df, theme=ttheme_default(base_size = 6) ))
head(monthly.covid.df)
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(newcases)), n = 5)
# Color labels
top5.cases <- c("#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2",
"#B20000","#69b3a2","#69b3a2","#B20000","#B20000","#B20000","#B20000","#B20000","#69b3a2")
# Case labels
num.cases.mils <- c("7", "32", "188K", "875K", "725K","847K", "1.9m", "1.4m", "1.2m", "1.9m", "4.4m", "6.3m", "6m", "2.3m", "1.7m")
# Visualize
b.cases <- barplot(newcases, yaxp=c(0, max(newcases), 15),
ylim=range(pretty(c(0, newcases))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Cases", main = "Number of Newly Recorded COVID-19 Cases per Month in the U.S. 20-21", col = top5.cases)
y<-as.matrix(newcases)
text(b.cases, y+300000,labels=num.cases.mils)
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(deaths)), n = 5)
# Color labels
deaths.top5 <- c("#90bcff","#90bcff","#90bcff","#B20000","#B20000","#90bcff",
"#90bcff","#90bcff","#90bcff","#90bcff","#90bcff","#B20000","#B20000","#B20000","#90bcff")
# Case labels
num.deaths.mils <- c(0, 0, '3,680', '55K', '42K,', '21K', '28K', '29K', '22K', '24K', '39K', '79K', '97K', '63K', '33K')
# Visualize
b.deaths <- barplot(deaths, yaxp=c(0, max(deaths), 15),
ylim=range(pretty(c(0, deaths))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Deaths", main = "Number of Newly Recorded COVID-19 Deaths per Month in the U.S. 20-21", col = deaths.top5)
y<-as.matrix(deaths)
text(b.deaths ,y+2000, labels=num.deaths.mils)
The table below summarizes the unemployment rates and monthly change in unemployment per month from Jan ’20 to Mar ’21. New cases and deaths per month have been calcuated as a proportion of the total cases and deaths from Jan ’20 to Mar ’21 in order to create a line chart with all three variables.
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
## 7 JUL20 10.2 -8.1 6.37214414227 5.2478270
## 8 AUG20 8.4 -17.6 4.86821293398 5.4215222
## 9 SEPT20 7.8 -7.1 4.03100172501 4.1476949
## 10 OCT20 6.9 -11.5 6.38421021402 4.4607179
## 11 NOV20 6.7 -2.9 14.57049022953 7.3187949
## 12 DEC20 6.7 0.0 21.17789628105 14.7109692
## 13 JAN21 6.3 -6.0 20.19442534241 18.0373733
## 14 FEB21 6.2 -1.6 7.79635782115 11.7835998
## 15 MAR21 6.0 -3.2 5.87317697682 6.1380385
Now, let’s take a look at correlations between unemployment rates, cases, and deaths:
In this line chart, dark red represents unemployment rate, dark blue represents percent contribution of new cases , and finally, dark green represents percent contribution of new deaths.
This graph helps us visualize the correlation between unemployment rates and the spread of COVID-19 in terms of newly recorded cases and deaths each month from Jan ‘20 to Mar ‘21. In the beginning of the timeline, we see stagnancy in unemployment during Jan ‘20 through Feb ‘20, as the pandemic had not yet reached the United States (although we had 39 cases, it had not reached major headlines yet and people were mostly unaware of the gravity of the virus, and we were at 0 deaths). Mar ‘20 is when COVID-19 hit the news headlines, cases shot up, and we experienced our first brutal month of deaths from this pandemic. The graph visualizes this categorial phenomenon, as Apr ‘20 had the highest recorded unemployment rate of 14.8% due to lockdowns and quarantine orders affecting businesses nationwide. We have studied which industries were hit the hardest further down in our analysis. Apr ‘20 through Jul ‘20 we experienced a fluctuating increase and decrease in the number of new cases and deaths, with unemployment numbers climbing and falling in sync. The timeline from Sept ‘20 to Jan ‘21 is especially surprising, as the number of cases and deaths hit their highest peaks yet unemployment rates increased slightly and stayed mostly continuous on a downward slope. This may be due to most local and state governments easing lockdown restrictions and as a result most businesses reopening with a better understanding of how to operate safely (outdoor dining, sanitization routines, plexiglass dividers, etc.). Overall, we can infer from this line chart that in the early stages of the pandemic there was a correlation between number of cases and deaths and unemployment rate, as cases/deaths informed decisions by government officials for safety procedures such as lockdowns; however, in the late stages of the pandemic, we learned how to live with the virus and our business adapted new ways to keep customers safe in order to reopen after a devastating year of closures which lowered unemployment rates although cases/deaths were still climbing. This might also infer that we opened our economy too soon, as Oct ‘20 to Jan ‘21 is when lockdown restrictions eased and cases/deaths climbed drastically, but that is a debate for another day.
Now, let’s take a look at correlations between monthly percent change in unemployment rates, cases, and deaths to confirm our graph from above:
In this line chart as well, dark red represents unemployment rate, dark blue represents percent contribution of new cases , and finally, dark green represents percent contribution of new deaths.
From the timeline developed by this line chart we can infer three notable changes: first, the initial spike in unemployment from Feb ‘20 to Mar ‘20; second, the peak of unemployment change from Mar ‘20 to Apr ‘20; and finally, third, the period of constant fluctuation in changes to unemployment levels from May ‘20 to Mar ‘21. Looking at the monthly changes in unemployment helps us confirm our earlier findings: unemployment rates are directly correlated with new cases and deaths in the early stage of the pandemic, but are unstable in later stages. In Apr ‘20 almost every local and state government in the U.S. entered lockdowns, creating the initial spike in unemployment rates and the large monthly change from Mar ‘20 to Apr ‘20. In the period of May ‘20 through Mar ‘21, individual local and state governments have had varying responses to the pandemic in terms of lockdowns, mask mandates, and reopening guidelines, which makes it difficult to map a constant pattern in unemployment rates during this time period (which we might consider the later stages of the pandemic). In order to look in more detail at the economic impact of COVID-19 on specific regions with specific pandemic responses, let’s now take a look at GDP by state and industry.
# Create a new unemployment dataframe
ur.viz <- ur.cleaned.df
# Renaming columns
colnames(ur.viz) <- c("Date", "Unemployment_Rate", "Monthly_Change")
# Cleaning and wrangling time/date data
ur.viz$Date <- months
ur.viz$Date <- factor(ur.viz$Date, levels = c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21"))
# Add COVID-19 cases data, calculate contribution percentage from each month to total cases
options(scipen = 999)
vector = c()
for (i in 1:length(newcases)) {
vector <- c(vector, (newcases[i]/sum(newcases)*100))
formatC(vector[i], digits = 1)
}
ur.viz$percentcases <- vector
# Add COVID-19 deaths data, calculate contribution percentage from each month to total deaths
vector.d = c()
for (i in 1:length(deaths)) {
vector.d <- c(vector.d, (deaths[i]/sum(deaths)*100))
formatC(vector.d[i], digits = 1)
}
ur.viz$percentdeaths <- vector.d
# Result
head(ur.viz)
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
# Creating the graph
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Unemployment_Rate, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Unemployment Rate Trends in the U.S. 2020-21")
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Monthly_Change, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Change Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Monthly Change in Unemployment Rates in the U.S. 2020-21")
## State_Name_Col State_2020_Rank Year_Col GDP_Change_Col
## 1 California 15 Y2017 4.3
## 2 California 15 Y2018 3.1
## 3 California 15 Y2019 3.4
## 4 California 15 Y2020 -2.8
## 5 Texas 23 Y2017 2.8
## 6 Texas 23 Y2018 3.9
## 7 Texas 23 Y2019 2.9
## 8 Texas 23 Y2020 -3.5
## 9 RhodeIsland 38 Y2017 -0.6
## 10 RhodeIsland 38 Y2018 0.8
## 11 RhodeIsland 38 Y2019 1.0
## 12 RhodeIsland 38 Y2020 -4.5
## 13 Utah 1 Y2017 4.4
## 14 Utah 1 Y2018 5.6
## 15 Utah 1 Y2019 3.8
## 16 Utah 1 Y2020 -0.1
This is a set of four bar plots that display the percent change in GDP for California, Texas, Rhode Island, and Utah. Each graph includes data from one of the states mentioned above and displays the change in GDP from 2017 to 2020. Each bar in the chart has been filled with red or green to indicate whether there was an increase or decrease in GDP percentage.
This bar plot displays the change in GDP for the state of California. We can conclude that the state had a very high starting percentage and the value dropped slighlty in 2018 and 2019. However, the most drastic drop in GDP percentage happened in 2020 with a value of -2.3%.
This bar plot illustrates the GDP percent change in Texas. There is a starting percent change of roughly 2.5% and it significantly increased in 2018, but then dropped in 2019. Texas also had a significant decrease in 2020 with a value of -4.5%. This value is almost twice as low as California.
This graph displays the GDP change in Rhode Island from 2017 to 2020. This visualization is unique because there already was a GDP decline in 2017. Also, the GDP values for 2018 and 2019 were not as high as the other states. Lastly, Rhode Island had the highest drop in GDP for 2020 with a value of -4.5%.
This last graph shows the change in GDP for the state of Utah. This state had the highest GDP values for 2017, 2018, and 2019. It is also important to note that there was a very small decrease in 2020.
Explore the Percent Change in GDP by State in 2019
This is an interactive map of the United States where each state is colored in a green gradient based on the GDP percent change. We can conclude that the majority of states had a GDP value between 2 and 5 percent. It is interesting to note that there are no states with negative percentages because this was before the pandemic happened.
Explore the Percent Change in GDP by State in 2020